Appl. No. 10/689,257

Amdt. dated August 15, 2006 

Reply to Office Action of 15 May 2006 

Amendments to the Specification: 

Please replace the current title with the following new title: 

Method of Shifting Data Along Diagonals in a Group of Processing Elements to Transpose the 
Data 

Please replace paragraph [0007] with the following amended paragraph: 

[0007] In the past, several different methods of connecting PEs have been used in a variety of 
geometric arrangements including hypercubes, butterfly networks, one-dimensional strings/rings 
and two-dimensional meshes. In a two-dimensional mesh or array, the PEs are arranged
in rows and columns, with each PE being connected to its four neighboring PEs in the rows
above and below and the columns to either side; these are sometimes referred to as the north, south, east
and west connections.
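As a minimal illustration of the north/south/east/west connectivity described in paragraph [0007], the following Python sketch computes the four neighbors of a PE in a rows-by-cols mesh. The function name `mesh_neighbors` and the (row, column) coordinate convention, with row 0 at the north edge, are assumptions for illustration only; edge PEs simply have no neighbor in the missing direction (no wraparound is modeled).

```python
from typing import Dict, Optional, Tuple

Coord = Tuple[int, int]


def mesh_neighbors(row: int, col: int, rows: int, cols: int) -> Dict[str, Optional[Coord]]:
    """Return the north/south/east/west neighbours of PE (row, col) in a
    rows x cols two-dimensional mesh; a missing edge link is None."""
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("PE coordinate outside the array")
    return {
        "north": (row - 1, col) if row > 0 else None,          # row above
        "south": (row + 1, col) if row < rows - 1 else None,   # row below
        "west": (row, col - 1) if col > 0 else None,           # column to the left
        "east": (row, col + 1) if col < cols - 1 else None,    # column to the right
    }
```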

Please replace paragraph [0049] with the following amended paragraph: 

[0049] Eight host memory access registers (H) may be provided, which allow a short burst of
four or eight bytes to be transferred into or out of the DRAM 24 for host access. Those registers 
may be multiplexed and be visible from the host memory interface 22 (see FIG. 1) as a page of 
data. More details about the PEs may be found in G.B. Patent Application No. 0221562.2 
entitled Host Memory Interface for a Parallel Processor and filed September 17, 2002, which is 
hereby incorporated by reference. 

Please replace paragraph [0052] with the following amended paragraph: 

[0052] At the edges of the array 36, the out-of-array connection is selected through a multiplexer
to be either the output from the opposite side of the array or an edge/row register 54 or an
edge/col. register 56. The edge registers 54, 56 can be loaded from the array output or from the
controller data bus. A data shift in the array can be performed by loading the X register from one
of the four neighboring directions. The contents of the X register can be conditionally loaded based on
the output of the AND gate of the row select and column select signals which intersect at each PE. When the
contents of the X register are conditionally loaded, the edge registers 54, 56 are also loaded
conditionally depending on the value of the select line which runs in the same direction. Hence,
an edge/row register 54 is loaded if the column select for that column is set to 1, and an edge/col.
register 56 is loaded if the row select is set to 1. The reader desiring more information about the
hardware configuration illustrated in FIG. 5 is directed to G.B. Patent Application GB0221563.0,
entitled Control of Processing Elements in Parallel Processors, filed September 17, 2002, now
Patent No. GB2395299, which is hereby incorporated by reference.
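The edge multiplexing and conditional loading of paragraph [0052] can be sketched, under simplifying assumptions, as one westward shift step in Python. The names `shift_west`, `row_sel`, `col_sel`, `wrap` and `edge_in` are hypothetical; `wrap=True` models the multiplexer selecting the output from the opposite side of the array, while `wrap=False` models input from a per-row edge register. Loading of the edge registers themselves is not modeled.

```python
from typing import List, Optional

Grid = List[List[int]]


def shift_west(x: Grid, row_sel: List[int], col_sel: List[int],
               wrap: bool = True, edge_in: Optional[List[int]] = None) -> Grid:
    """One hypothetical westward shift step of the X registers.

    Each PE takes the X value of its east neighbour. For the eastmost column
    the input is either the westmost column of the same row (wrap=True) or a
    per-row edge register value (wrap=False, taken from edge_in). A PE only
    updates when row_sel[r] AND col_sel[c] is 1; otherwise it keeps its X.
    """
    if not wrap and edge_in is None:
        raise ValueError("edge_in is required when wrap is False")
    rows, cols = len(x), len(x[0])
    out = [row[:] for row in x]
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                incoming = x[r][c + 1]      # east neighbour inside the array
            elif wrap:
                incoming = x[r][0]          # opposite (west) side of the array
            else:
                incoming = edge_in[r]       # edge register for this row
            if row_sel[r] & col_sel[c]:     # AND of the intersecting selects
                out[r][c] = incoming
    return out
```

With all selects set to 1 and `wrap=True`, `shift_west([[1, 2, 3], [4, 5, 6]], [1, 1], [1, 1, 1])` rotates each row one place west, giving `[[2, 3, 1], [5, 6, 4]]`.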

Please replace paragraph [0057] with the following amended paragraph: 

[0057] Returning to FIG. 5, the PE-PE interconnect may also provide a broadcast and broadcatch 
network. Connections or buses 58 extend north to south from a column select register 59 and 
connections or buses 60 extend west to east from a row select register 61. Also provided are a row
broadcast/broadcatch AND chain 62 and a column broadcast/broadcatch AND chain. When
used for data broadcast or broadcatch, these connections (column buses 58 and row buses 60) act
as if driven by open drain drivers; the value on any bit is the wire-AND of all the drivers' outputs.
Three control signals (broadcatch, broadcast and intercast) determine the direction of the buses 
as follows: 

• If broadcatch is set to 1, any PE for which the corresponding bits of the row select 
register 61 and column select register 59 are both set will drive both the row buses 
60 and the column buses 58. Note that if no PEs in a row or column drive the 
bus, the edge register at the end of that row or column will be loaded with 0xFF.

• If broadcast is set to 1, the row bus 60 is driven from the row select register 61 
and the column bus 58 is driven from the column select register 59 and any PE for 
which the corresponding bits of the row select register 61 and column select 
register 59 are both set will be loaded from one of the row or column inputs, 
according to which is selected. 

• If intercast is set to 1, any PE whose A register is 1 will drive its output onto
its row bus 60 and column bus 58, and any PE for which the corresponding bits of
the row select register 61 and column select register 59 are both set will be loaded
from one of the row buses 60 or column buses 58, according to which is selected.
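The wire-AND behavior of the broadcatch and intercast modes in paragraph [0057] can be modeled in Python roughly as follows. This is a hypothetical sketch with 8-bit values: an undriven bus reads 0xFF, as stated for broadcatch, and the same all-ones default is assumed for intercast. The broadcast mode is omitted because the text does not specify how the select-register contents map onto bus data, and all function names are illustrative only.

```python
from functools import reduce
from typing import List, Tuple

Grid = List[List[int]]
ALL_ONES = 0xFF  # an undriven 8-bit open-drain bus reads as all ones


def wire_and(values: List[int]) -> int:
    """Wire-AND of every driver's output; with no drivers the bus stays 0xFF."""
    return reduce(lambda a, b: a & b, values, ALL_ONES)


def bus_values(x: Grid, drives: List[List[bool]]) -> Tuple[List[int], List[int]]:
    """Return (row bus values, column bus values) for the given drivers."""
    rows, cols = len(x), len(x[0])
    row_bus = [wire_and([x[r][c] for c in range(cols) if drives[r][c]])
               for r in range(rows)]
    col_bus = [wire_and([x[r][c] for r in range(rows) if drives[r][c]])
               for c in range(cols)]
    return row_bus, col_bus


def broadcatch(x: Grid, row_sel: List[int], col_sel: List[int]) -> Tuple[List[int], List[int]]:
    """PEs whose row and column select bits are both set drive both buses;
    the returned values are what the edge registers would be loaded with."""
    drives = [[bool(row_sel[r] & col_sel[c]) for c in range(len(x[0]))]
              for r in range(len(x))]
    return bus_values(x, drives)


def intercast(x: Grid, a_reg: Grid, row_sel: List[int], col_sel: List[int],
              from_row: bool = True) -> Grid:
    """PEs whose A register is 1 drive; selected PEs load from a bus."""
    drives = [[a == 1 for a in row] for row in a_reg]
    row_bus, col_bus = bus_values(x, drives)
    out = [row[:] for row in x]
    for r in range(len(x)):
        for c in range(len(x[0])):
            if row_sel[r] & col_sel[c]:
                out[r][c] = row_bus[r] if from_row else col_bus[c]
    return out
```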

Please replace paragraph [0066] with the following amended paragraph: 

[0066] All X values are passed through the PE; the required output value is conditionally loaded
once it has arrived in the PE. The conditional loading can be done in various ways, e.g., by using
any PE register except X, R1, or R2. An example is shown below.
• At time T+0: The X register reads data from the X register on the PE to the East.
This shifts data to the left (or West).

• At time T+1: The R1 register unconditionally reads the data off the shift network (X
register).

• At time T+2: The R0 register conditionally loads the data from R1 (i.e., if <cond>=1).
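The T+0 to T+2 sequence of paragraph [0066] can be illustrated with a hypothetical one-row model in Python: every X value passes through each PE, R1 copies the shift network unconditionally, and R0 captures the value only when its condition holds. Here <cond> is assumed, purely for illustration, to be true when the number of shifts equals a per-PE `distance`; the wraparound at the east edge and the name `shift_and_capture` are also assumptions.

```python
from typing import List


def shift_and_capture(x: List[int], distance: List[int]) -> List[int]:
    """Hypothetical model of one row of PEs with X, R1 and R0 registers.

    distance[i] (>= 1) is how many places east of PE i its wanted value starts;
    <cond> is taken to be "the shift count equals distance[i]".
    """
    n = len(x)
    if n == 0 or len(distance) != n or min(distance) < 1:
        raise ValueError("need one distance >= 1 per PE")
    X = list(x)
    R0 = [0] * n
    for step in range(1, max(distance) + 1):
        # T+0: each X reads the X register of the PE to the East (shift West);
        # the eastmost PE wraps around to the westmost one (an assumption).
        X = [X[(i + 1) % n] for i in range(n)]
        # T+1: R1 unconditionally reads the data off the shift network.
        R1 = list(X)
        # T+2: R0 conditionally loads from R1 when <cond> is 1 for this PE.
        for i in range(n):
            if step == distance[i]:
                R0[i] = R1[i]
    return R0
```

For example, `shift_and_capture([10, 20, 30, 40], [1, 2, 3, 1])` returns `[20, 40, 20, 10]`: each PE keeps the value that started `distance` places to its east, even though other values pass through it on the way.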